---
sidebar_position: 3
---

# How to cache chat model responses

:::info Prerequisites

This guide assumes familiarity with the following concepts:

- [Chat models](/docs/concepts/chat_models)
- [LLMs](/docs/concepts/text_llms)

:::

LangChain provides an optional caching layer for chat models. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider, if you're often requesting the same completion multiple times.
It can speed up your application by reducing the number of API calls you make to the LLM provider.

import CodeBlock from "@theme/CodeBlock";

```typescript
import { ChatOpenAI } from "@langchain/openai";

// To make the caching really obvious, lets use a slower model.
const model = new ChatOpenAI({
  model: "gpt-4",
  cache: true,
});
```

## In Memory Cache

The default cache is stored in-memory. This means that if you restart your application, the cache will be cleared.

```typescript
console.time();

// The first time, it is not yet in cache, so it should take longer
const res = await model.invoke("Tell me a joke!");
console.log(res);

console.timeEnd();

/*
  AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
      additional_kwargs: { function_call: undefined, tool_calls: undefined }
    },
    lc_namespace: [ 'langchain_core', 'messages' ],
    content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
    name: undefined,
    additional_kwargs: { function_call: undefined, tool_calls: undefined }
  }
  default: 2.224s
*/
```

```typescript
console.time();

// The second time it is, so it goes faster
const res2 = await model.invoke("Tell me a joke!");
console.log(res2);

console.timeEnd();
/*
  AIMessage {
    lc_serializable: true,
    lc_kwargs: {
      content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
      additional_kwargs: { function_call: undefined, tool_calls: undefined }
    },
    lc_namespace: [ 'langchain_core', 'messages' ],
    content: "Why don't scientists trust atoms?\n\nBecause they make up everything!",
    name: undefined,
    additional_kwargs: { function_call: undefined, tool_calls: undefined }
  }
  default: 181.98ms
*/
```

## Caching with Redis

LangChain also provides a Redis-based cache. This is useful if you want to share the cache across multiple processes or servers.
To use it, you'll need to install the `redis` package:

```bash npm2yarn
npm install ioredis @langchain/community @langchain/core
```

Then, you can pass a `cache` option when you instantiate the LLM. For example:

import RedisCacheExample from "@examples/cache/chat_models/redis.ts";

<CodeBlock language="typescript">{RedisCacheExample}</CodeBlock>

## Caching with Upstash Redis

LangChain provides an Upstash Redis-based cache. Like the Redis-based cache, this cache is useful if you want to share the cache across multiple processes or servers. The Upstash Redis client uses HTTP and supports edge environments. To use it, you'll need to install the `@upstash/redis` package:

```bash npm2yarn
npm install @upstash/redis
```

You'll also need an [Upstash account](https://docs.upstash.com/redis#create-account) and a [Redis database](https://docs.upstash.com/redis#create-a-database) to connect to. Once you've done that, retrieve your REST URL and REST token.

Then, you can pass a `cache` option when you instantiate the LLM. For example:

import UpstashRedisCacheExample from "@examples/cache/chat_models/upstash_redis.ts";

<CodeBlock language="typescript">{UpstashRedisCacheExample}</CodeBlock>

You can also directly pass in a previously created [@upstash/redis](https://docs.upstash.com/redis/sdks/javascriptsdk/overview) client instance:

import AdvancedUpstashRedisCacheExample from "@examples/cache/chat_models/upstash_redis_advanced.ts";

<CodeBlock language="typescript">{AdvancedUpstashRedisCacheExample}</CodeBlock>

## Caching with Vercel KV

LangChain provides an Vercel KV-based cache. Like the Redis-based cache, this cache is useful if you want to share the cache across multiple processes or servers. The Vercel KV client uses HTTP and supports edge environments. To use it, you'll need to install the `@vercel/kv` package:

```bash npm2yarn
npm install @vercel/kv
```

You'll also need an Vercel account and a [KV database](https://vercel.com/docs/storage/vercel-kv/kv-reference) to connect to. Once you've done that, retrieve your REST URL and REST token.

Then, you can pass a `cache` option when you instantiate the LLM. For example:

import VercelKVCacheExample from "@examples/cache/chat_models/vercel_kv.ts";

<CodeBlock language="typescript">{VercelKVCacheExample}</CodeBlock>

## Caching with Cloudflare KV

:::info
This integration is only supported in Cloudflare Workers.
:::

If you're deploying your project as a Cloudflare Worker, you can use LangChain's Cloudflare KV-powered LLM cache.

For information on how to set up KV in Cloudflare, see [the official documentation](https://developers.cloudflare.com/kv/).

**Note:** If you are using TypeScript, you may need to install types if they aren't already present:

```bash npm2yarn
npm install -S @cloudflare/workers-types
```

import CloudflareExample from "@examples/cache/chat_models/cloudflare_kv.ts";

<CodeBlock language="typescript">{CloudflareExample}</CodeBlock>

## Caching on the File System

:::warning
This cache is not recommended for production use. It is only intended for local development.
:::

LangChain provides a simple file system cache.
By default the cache is stored a temporary directory, but you can specify a custom directory if you want.

```typescript
const cache = await LocalFileCache.create();
```

## Next steps

You've now learned how to cache model responses to save time and money.

Next, check out the other how-to guides on chat models, like [how to get a model to return structured output](/docs/how_to/structured_output) or [how to create your own custom chat model](/docs/how_to/custom_chat).
